A Web-Site-Based Partitioning Technique for Reducing Preprocessing Overhead of Parallel PageRank Computation
نویسندگان
چکیده
A power method formulation, which efficiently handles the problem of dangling pages, is investigated for parallelization of PageRank computation. Hypergraph-partitioning-based sparse matrix partitioning methods can be successfully used for efficient parallelization. However, the preprocessing overhead due to hypergraph partitioning, which must be repeated often due to the evolving nature of the Web, is quite significant compared to the duration of the PageRank computation. To alleviate this problem, we utilize the information that sites form a natural clustering on pages to propose a site-based hypergraph-partitioning technique, which does not degrade the quality of the parallelization. We also propose an efficient parallelization scheme for matrix-vector multiplies in order to avoid possible communication due to the pages without in-links. Experimental results on realistic datasets validate the effectiveness of the proposed models.
منابع مشابه
Web-Site-Based Partitioning Techniques for Efficient Parallelization of the PageRank Computation
The efficiency of the PageRank computation is important since the constantly evolving nature of the Web requires this computation to be repeated many times. PageRank computation includes repeated iterative sparse matrix-vector multiplications. Due to the enourmous size of the Web matrix to be multiplied, PageRank computations are usually carried out on parallel systems. Graph and hypergraph par...
متن کاملWeb-Site-Based Partitioning Techniques for Reducing the Preprocessing Overhead before the Parallel PageRank Computations
The efficiency of the PageRank computation is important since the constantly evolving nature of the Web requires this computation to be repeated many times. Due to the enormous size of the Web’s hyperlink structure, PageRank computations are usually carried out on parallel computers. Recently, a hypergraph-partitioning-based formulation for parallel sparse-matrix vector multiplication is propos...
متن کاملHypergraph Partitioning for Faster Parallel PageRank Computation
The PageRank algorithm is used by search engines such as Google to order web pages. It uses an iterative numerical method to compute the maximal eigenvector of a transition matrix derived from the web’s hyperlink structure and a user-centred model of web-surfing behaviour. As the web has expanded and as demand for user-tailored web page ordering metrics has grown, scalable parallel computation ...
متن کاملAn Inner/Outer Stationary Iteration for Computing PageRank
We present a stationary iterative scheme for PageRank computation. The algorithm is based on a linear system formulation of the problem, uses inner/outer iterations, and amounts to a simple preconditioning technique. It is simple, can be easily implemented and parallelized, and requires minimal storage overhead. Convergence analysis shows that the algorithm is effective for a crude inner tolera...
متن کاملParallel algorithms for hypergraph partitioning
Near-optimal decomposition is central to the efficient solution of numerous problems in scientific computing and computer-aided design. In particular, intelligent a priori partitioning of input data can greatly improve the runtime and scalability of large-scale parallel computations. Discrete data structures such as graphs and hypergraphs are used to formalise such partitioning problems, with h...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2006